257 research outputs found

    Alignment of Spanish and English TREC topic descriptions

    A technique is described for aligning TREC topic descriptions that can produce a small multilingual test collection for use in cross-language ad hoc and routing evaluations. Methods for measuring the degree of degradation induced by the necessary approximations are described and illustrated using examples from an evaluation of two cross-language routing techniques. Although the experiments were conducted on a relatively small test collection using existing TREC relevance judgments, the results suggest that cross-language routing is practical and that the investment required to produce a truly multilingual test collection for the TREC multilingual track would be justified.

    Bridging communities of practice: Emerging technologies for content-centered linking

    The project fosters convergence between two communities by addressing complementary aspects of a shared opportunity. Digital humanists, working in tandem with information professionals in libraries, archives, and museums, are at the forefront of developing ways to make cultural heritage metadata increasingly interoperable as linked open data. Computer scientists are developing automated techniques for extracting linkable data from the content itself. Bringing these communities together offers transformational potential for the application of a critical infrastructure in humanities scholarship. Two workshops will be organized to seize this unique opportunity. The first will bring together humanities scholars and computer scientists to explore applications of new content linking technologies to dispersed and disparate material. In the second, a larger group of humanities scholars will identify specific content to which the techniques discussed in the first workshop will be applied.

    Known by the Company it Keeps: Proximity-Based Indexing for Physical Content in Archival Repositories

    Despite the plethora of born-digital content, vast troves of important content remain accessible only on physical media such as paper or microfilm. The traditional approach to indexing undigitized content is to use manually created metadata that describes content at some level of aggregation (e.g., folder, box, or collection). Searchers led in this way to some subset of the content must often then manually examine substantial quantities of physical media to find what they are looking for. This paper proposes a complementary approach, in which selective digitization of a small portion of the content serves as the basis for proximity-based indexing, bringing the user closer to the specific content they are looking for. Experiments with 35 boxes of partially digitized US State Department records indicate that box-level indexes built in this way can provide a useful basis for search.
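
    One simple reading of box-level indexing from a small digitized sample can be sketched in Python. This is an illustrative assumption, not the paper's method: the OCR text of the sampled pages in each box is pooled into one term-count vector, and boxes are ranked by how often the query terms occur in that pool. The box identifiers and page texts below are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_box_index(digitized_pages):
    """Pool the text of each box's digitized sample pages into one term-count vector.

    digitized_pages maps a (hypothetical) box id to the OCR text of the few
    pages that were digitized from that box.
    """
    return {
        box: Counter(tok for page in pages for tok in tokenize(page))
        for box, pages in digitized_pages.items()
    }


def rank_boxes(index, query):
    """Return ids of boxes whose sampled pages contain a query term, best first."""
    terms = tokenize(query)
    scores = {box: sum(counts[t] for t in terms) for box, counts in index.items()}
    matching = [box for box, score in scores.items() if score > 0]
    # sorted() is stable, so boxes with equal scores keep their index order.
    return sorted(matching, key=lambda box: -scores[box])


# Hypothetical sample: a few digitized pages per box.
index = build_box_index({
    "box-01": ["Telegram on grain exports to Egypt"],
    "box-02": ["Memo on embassy staffing", "Staffing budget for the embassy"],
    "box-03": ["Grain shipment schedule", "Grain tonnage report"],
})
print(rank_boxes(index, "grain"))  # → ['box-03', 'box-01']
```

    Pooling by box rests on the premise in the title: undigitized pages are assumed to resemble the digitized pages stored near them.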

    It’s About Time: Projecting Temporal Metadata for Historically Significant Recordings

    Twentieth-century audio recordings and motion pictures are important sources, both for scholarly analysis and for public history. In some cases, important metadata did not reach the collecting institutions along with the materials, which now need richer description. This paper describes a novel technique for determining the date and time at which a recording was made, based on analysis of incidentally captured traces of small variations in the electric power supply at the time of recording.
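
    A minimal sketch of the final matching step, assuming a per-second frequency trace has already been extracted from the recording's audio (that extraction is not shown) and that a reference log of grid frequency at the same sampling rate is available. It slides the trace along the log, picks the offset with the smallest mean squared difference, and converts that offset to a start time. The function names, the data, and the mean-squared-error criterion are choices made for illustration; correlation-based matching is a common alternative.

```python
from datetime import datetime, timedelta


def best_alignment(trace, reference):
    """Find where a recording's frequency trace best fits a reference grid log.

    trace:     frequency deviations (Hz) estimated from the recording, one per step.
    reference: logged grid-frequency deviations (Hz) at the same sampling rate.
    Returns (offset, error): the start index in `reference` with the smallest
    mean squared difference, and that error. Using MSE is an illustrative choice.
    """
    n = len(trace)
    if n == 0 or n > len(reference):
        raise ValueError("trace must be non-empty and no longer than the reference")
    best = None
    for offset in range(len(reference) - n + 1):
        window = reference[offset:offset + n]
        err = sum((a - b) ** 2 for a, b in zip(trace, window)) / n
        if best is None or err < best[1]:
            best = (offset, err)
    return best


def estimate_start(trace, reference, log_start, step=timedelta(seconds=1)):
    """Convert the best offset into a timestamp (assumes a uniformly sampled log)."""
    offset, _ = best_alignment(trace, reference)
    return log_start + offset * step


# Hypothetical data: a short reference log and a slightly noisy trace.
reference = [0.0, 0.01, -0.02, 0.03, 0.02, -0.01, 0.0, 0.01]
trace = [0.031, 0.019, -0.011]
print(estimate_start(trace, reference, datetime(1960, 1, 1, 12, 0, 0)))  # → 1960-01-01 12:00:03
```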

    Investigating cross-language speech retrieval for a spontaneous conversational speech collection

    Cross-language retrieval of spontaneous speech combines the challenges of working with noisy automated transcription and with language translation. The CLEF 2005 Cross-Language Speech Retrieval (CL-SR) task provides a standard test collection for investigating these challenges. We show that retrieval performance can be improved by careful selection of the term weighting scheme, by decomposing automated transcripts into phonetic substrings to help ameliorate transcription errors, and by combining automatic transcriptions with manually assigned metadata. We further show that topic translation with online machine translation resources yields effective CL-SR.
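
    Decomposing a phonetic transcript into overlapping substrings can be illustrated with phone n-grams. This is a sketch under stated assumptions: ARPAbet-style phone symbols, n = 3, and a simple set-overlap score are chosen for illustration and are not taken from the paper. A single misrecognized phone breaks only the n-grams that contain it, so a query can still partially match.

```python
def phone_ngrams(phones, n=3):
    """Overlapping phone n-grams, e.g. ["K", "AE", "T"] -> ["K_AE_T"] for n=3."""
    if n <= 0:
        raise ValueError("n must be positive")
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def ngram_overlap(query_phones, doc_phones, n=3):
    """Fraction of the query's distinct n-grams that also occur in the document."""
    q = set(phone_ngrams(query_phones, n))
    d = set(phone_ngrams(doc_phones, n))
    return len(q & d) / len(q) if q else 0.0


# Hypothetical ARPAbet-style transcriptions of "liberation": the recognizer
# has misheard SH as S, so an exact whole-word match would fail.
query = ["L", "IH", "B", "ER", "EY", "SH", "AH", "N"]
transcript = ["L", "IH", "B", "ER", "EY", "S", "AH", "N"]
print(ngram_overlap(query, transcript))  # → 0.5
```

    Here 3 of the query's 6 trigrams survive the recognition error, whereas comparing whole words would find no match at all.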

    NTCIR CLIR Experiments at the University of Maryland

    This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that using only the first translation in the EDICT dictionary is comparable to using every translation. Japanese term segmentation posed no unusual problems, which contrasts sharply with results previously obtained for cross-language retrieval between Chinese and English.
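
    The two translation strategies compared above, keeping only the first dictionary translation versus keeping every translation, can be sketched as word-by-word replacement. The toy dictionary and the pass-through rule for terms missing from the dictionary are hypothetical, and the sketch assumes the Japanese query has already been segmented into terms.

```python
def translate_query(terms, bilingual_dict, mode="all"):
    """Word-by-word dictionary-based query translation.

    mode="first": keep only the first listed translation of each term.
    mode="all":   keep every listed translation.
    Terms with no dictionary entry are passed through unchanged (an assumed
    fallback for this sketch).
    """
    if mode not in ("first", "all"):
        raise ValueError("mode must be 'first' or 'all'")
    translated = []
    for term in terms:
        translations = bilingual_dict.get(term)
        if not translations:
            translated.append(term)
        elif mode == "first":
            translated.append(translations[0])
        else:
            translated.extend(translations)
    return translated


# Toy Japanese-to-English entries (hypothetical, not copied from EDICT).
toy_dict = {"情報": ["information", "intelligence"], "検索": ["retrieval", "search"]}
query = ["情報", "検索"]  # assumes segmentation has already been done
print(translate_query(query, toy_dict, mode="first"))  # → ['information', 'retrieval']
print(translate_query(query, toy_dict, mode="all"))
```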

    Searching Spontaneous Conversational Speech

    The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam. The workshop program mixed several elements, including a keynote speech, paper presentations, and panel discussions. This brief report describes the organization of the workshop and summarizes the discussions.

    Structured Translation for Cross-Language Information Retrieval

    The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system first determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated using Chinese queries to retrieve English documents. Structured translation achieved average precision statistically indistinguishable from that of Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications.
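
    The two-stage process can be sketched as follows, assuming a structured dictionary that maps each source term to translation clusters keyed by sense. To keep the sketch small, the intended meaning is supplied by the caller (as a user might choose it in an interactive system); the automatic sense selection and dictionary clustering described in the paper are not reproduced, and the dictionary entry is a toy example.

```python
def structured_translate(term, structured_dict, intended_sense, collection_vocab):
    """Two-stage translation of one query term.

    structured_dict maps a source term to {sense label: [translations]}, i.e.
    translations already clustered by meaning (hypothetical entries here).
    Stage 1: use the intended meaning; in this sketch the caller supplies it.
    Stage 2: keep only that cluster's translations that occur in the collection.
    """
    senses = structured_dict.get(term, {})
    if intended_sense not in senses:
        raise KeyError(f"no sense {intended_sense!r} for term {term!r}")
    return [t for t in senses[intended_sense] if t in collection_vocab]


# Toy Chinese-to-English entry with two meaning clusters.
toy_dict = {"打": {"hit": ["hit", "beat", "strike"], "play": ["play"]}}
vocab = {"hit", "strike", "play", "ball"}  # words found in the English collection
print(structured_translate("打", toy_dict, "hit", vocab))  # → ['hit', 'strike']
```

    The surviving translations could then be grouped as synonyms at retrieval time, which is the core idea of Pirkola's technique.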